Tool for Czech Pronunciation Generation Combining Fixed Rules with Pronunciation Lexicon and Lexicon Management Tool
Abstract
This paper presents two tools that can support speech recognition. The first, “transc”, generates the phonetic transcription (pronunciation) of a given utterance. It relies mainly on fixed rules, which can be defined for Czech pronunciation, but it can also use a specified list of exceptions defined in a lexicon. This allows “transc” to be applied to unknown text with a high probability of generating a correct phonetic transcription. The second part is devoted to the lexicon management tool “lexedit”, which can be useful when building a pronunciation lexicon for collected corpora. The tool supports editing pronunciations, playing pronunciation examples, comparing against a reference lexicon, updating the reference lexicon, and more.
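The abstract states only that “transc” combines fixed rules with an exception list defined in a lexicon; it does not describe the rules or the data formats. The Python sketch below is therefore a hypothetical illustration of that architecture, not the actual tool. Each word is looked up in an exception lexicon first, and only words missing from it fall back to a few well-known Czech rules: the digraph "ch", palatalization of d/t/n before i/í/ě, "ě" after labials, word-final devoicing, and regressive voicing assimilation. The phoneme symbols, the `EXCEPTIONS` entry, and all function names are assumptions; "ř", glottal stops, and assimilation across word boundaries are deliberately left out.

```python
from __future__ import annotations

# Hypothetical exception lexicon (word -> phonemes); entries override the rules.
EXCEPTIONS: dict[str, list[str]] = {
    # Loanword: "ti" keeps a hard /t/ instead of the rule-predicted /ť/.
    "politika": ["p", "o", "l", "i", "t", "i", "k", "a"],
}

# Assumed phoneme set written with Czech letters; "x" stands for the sound of "ch".
LETTER_MAP = {"y": "i", "ý": "í", "ů": "ú", "w": "v"}
PALATAL = {"d": "ď", "t": "ť", "n": "ň"}
VOICED_TO_VOICELESS = {"b": "p", "d": "t", "ď": "ť", "g": "k", "v": "f",
                       "z": "s", "ž": "š", "h": "x", "dz": "c", "dž": "č"}
VOICELESS_TO_VOICED = {"p": "b", "t": "d", "ť": "ď", "k": "g", "f": "v",
                       "s": "z", "š": "ž", "c": "dz", "č": "dž"}
VOICELESS = set(VOICELESS_TO_VOICED) | {"x"}
# "v" assimilates itself but does not voice the preceding consonant.
VOICED_TRIGGERS = set(VOICED_TO_VOICELESS) - {"v"}


def graphemes_to_phonemes(word: str) -> list[str]:
    """Map one lowercase word to phonemes with context-sensitive letter rules."""
    phones: list[str] = []
    i = 0
    while i < len(word):
        ch = word[i]
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch == "c" and nxt == "h":              # digraph "ch" -> /x/
            phones.append("x")
            i += 2
        elif ch == "x":                           # letter "x" -> /k s/
            phones.extend(["k", "s"])
            i += 1
        elif ch in PALATAL and nxt in ("i", "í", "ě"):
            phones.append(PALATAL[ch])            # d, t, n -> ď, ť, ň before i, í, ě
            phones.append("e" if nxt == "ě" else nxt)
            i += 2
        elif ch == "ě":
            prev = phones[-1] if phones else ""
            if prev == "m":                       # "mě" -> /m ň e/
                phones.append("ň")
            elif prev in ("b", "p", "v", "f"):    # "bě", "pě", "vě", "fě" -> labial + /j e/
                phones.append("j")
            phones.append("e")
            i += 1
        else:
            phones.append(LETTER_MAP.get(ch, ch))
            i += 1
    return phones


def apply_voicing(phones: list[str]) -> list[str]:
    """Word-final devoicing, then right-to-left regressive voicing assimilation."""
    out = list(phones)
    if out and out[-1] in VOICED_TO_VOICELESS:
        out[-1] = VOICED_TO_VOICELESS[out[-1]]
    # Walking right to left lets voicing propagate through whole obstruent clusters.
    for i in range(len(out) - 2, -1, -1):
        cur, nxt = out[i], out[i + 1]
        if nxt in VOICELESS and cur in VOICED_TO_VOICELESS:
            out[i] = VOICED_TO_VOICELESS[cur]
        elif nxt in VOICED_TRIGGERS and cur in VOICELESS_TO_VOICED:
            out[i] = VOICELESS_TO_VOICED[cur]
    return out


def transcribe(utterance: str,
               exceptions: dict[str, list[str]] | None = None) -> list[list[str]]:
    """Transcribe word by word: lexicon lookup first, fixed rules as the fallback."""
    lexicon = EXCEPTIONS if exceptions is None else exceptions
    result: list[list[str]] = []
    for word in utterance.lower().split():
        if word in lexicon:
            result.append(list(lexicon[word]))
        else:
            result.append(apply_voicing(graphemes_to_phonemes(word)))
    return result


if __name__ == "__main__":
    for word, phones in zip("kdo sníh politika".split(), transcribe("Kdo sníh politika")):
        print(word, " ".join(phones))
```

Run as a script, it prints `kdo g d o`, `sníh s ň í x`, and `politika p o l i t i k a`. The last line comes from the lexicon, whereas the rules alone would produce a palatal `ť`. Keeping the exceptions in a separate lexicon means irregular words can be added without touching the rule code.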
Similar resources
Orthographic and Phonetic Annotation of Very Large Czech Corpora with Quality Assessment
Annotation is generally an inseparable part of a speech database. In this paper we present joint orthographic and phonetic annotation of large Czech databases. Phonetic annotation can be very important, as it gives more information than a pronunciation lexicon with possible pronunciation variants. Moreover, for the Czech language, phonetic annotation means just a small additional effort compared to standard ...
Automatic generation of domain-dependent pronunciation lexicon with data-driven rules and rule adaptation
In this paper, we describe a method for automatically generating a domain-dependent pronunciation lexicon using a data-driven approach. We also introduce an adaptation method to alleviate some of the errors caused by data-driven rules derived from a relatively small speech corpus. First, pronunciation variation rules are extracted from a large speech corpus ...
Impact of Irregular Pronunciation on Phonetic Segmentation of Nijmegen Corpus of Casual Czech
This paper describes a pilot study of phonetic segmentation applied to the Nijmegen Corpus of Casual Czech (NCCCz). This corpus contains informal, strongly spontaneous speech, which influences the character of the produced speech at various levels. This work is part of wider research on pronunciation reduction in such informal speech. We present the analysis of the a...
Multiple-Pronunciation Lexical Modeling Based on Phoneme Confusion Matrix for Dysarthric Speech Recognition
In this paper, we propose speaker-dependent multiple-pronunciation lexical modeling for improving the performance of dysarthric automatic speech recognition (ASR). For each dysarthric speaker, a phoneme confusion matrix is first constructed from the results of phoneme recognition. Then, pronunciation variation rules are extracted by investigating the phoneme confusion matrix, and they are incor...
Modeling Cross-morpheme Pro for Korean Large Vocabulary Cont
In this paper, we describe a cross-morpheme pronunciation variation model which is especially useful for constructing a morpheme-based pronunciation lexicon for Korean LVCSR. Many pronunciation variations occur at morpheme boundaries in continuous speech. Since phonemic context together with morphological category and morpheme boundary information affect Korean pronunciation var...